# install.packages(c(dplyr, tidyverse, sf, nngeo, leaflet, plotly, DT, zoo, epitools))
library(dplyr) # data wrangling with pipe syntax
library(tidyverse) # data wrangling
library(sf) # simple features geometry data
library(nngeo) # perform nearest neighbor analysis
library(leaflet) # map making
library(plotly) # making charts
library(DT) # making tables
library(zoo) # data analysis
library(epitools) # data wrangling
options(scipen=999, digits = 2) # format output for data tablesGEO 3/330 Exercise #3
Evaluate Trends in L Station Boardings by Community COVID Vulnerability
Purpose
This project examines overall trends in CTA L ridership between 1999 and 2023 and identifies how the COVID-19 pandemic may have influenced train travel–both generally and across specific lines (e.g., Green, Pink, Red) and neighborhood categories–in the City of Chicago throughout the various waves of the pandemic. The sections below step you through the process for creating tables, maps and figures to inform your presentation. Download the exercise #3 presentation template to organize data elements and summaries. Submit the completed presentation to the appropriate submission folder on D2L.
Step 1. Create a new R project
This exercise will make use of both RStudio and QGIS, which are available on all DePaul computers. (See download and install instructions below if you are planning to use your own laptop.) In RStudio, create a project within a new directory (e.g., GEO330/exercise_03) of a course folder on your computer. The new directory will serve as your project working directory where you will store files related to exercise #3. Create the following folders within the new exercise-specific directory: (1) “data”, “images”, and “scripts”.
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. You can download and install R via the R Archive Network web pages.
RStudio is an integrated development environment (IDE) for R. It has a set of tools built to help R users be more productive with R (and other languages including Python)). It includes a console, syntax-highlighting editor that supports direct code execution. It also features tools for plotting, viewing history, debugging and managing your workspace. You can download RStudio via the Posit website.
QGIS is an Open Source Geographic Information System (GIS) licensed under the GNU General Public License. QGIS is an official project of the Open Source Geospatial Foundation (OSGeo). The program runs on Linux, Unix, Mac OSX, Windows and Android and supports numerous vector, raster, and database formats and functionalities. You can download QGIS via the QGIS download web page
Step 2. Create a new R script
In the “Files” tab of your RStudio interface (conventionally located on the bottom left screen), navigate to the “scripts” folder in your working directory. Create a new R script in the “scripts” directory and assign it an intuitive name (e.g., “Exercise_03.R”). You will use this script to manage and execute code all code for this exercise.
Step 3. Install and activate R packages
We will use several R packages to complete the exercise, including “dplyr”, “tidyverse” and “sf”. Copy and paste the following code to activate the associated packages in R. It may also be necessary to install the packages prior to activating. If so, uncomment the installation function.
Step 4. Download and extract geographic data
This exercise evaluates L ridership and COVID-19 trends for neighborhoods in the city of Chicago. To do this, we need to download zip codes, community areas and L stations in a spatial (i.e., mappable) format (e.g., shapefile, GEOJSON). These datasets, and all data for this exercise, are publicly available via the Chicago data portal and the Regional Transportation Authority’s Mapping and Statistics (RTAMS) websites.
Download city of Chicago community area and zip code data
Use the following code to import community areas and zip codes directly from the Chicago Data Portal into RStudio.
# import City of Chicago community areas (N=77) GEOJSON from Chicago data portal and transform into standard geographic projection (epsg:3435)
community_areas <- st_read("https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON") %>%
st_transform(3435) %>%
mutate(`comarea_id` = as.numeric(area_numbe),
comarea_name = str_to_title(community)) %>%
select(comarea_id, comarea_name)
zip_codes <- st_read("https://data.cityofchicago.org/api/geospatial/gdcf-axmw?method=export&format=GeoJSON") %>%
st_transform(3435) %>%
select(zip)Download and extract L stations from RTAMS
Navigate to the Shapefiles and Feature Layers section of the RTAMS Maps and Data web page to download CTA Rail Stations and CTA Rail Lines in shapefile format. Extract the shapefiles (i.e., all files with prefix CTA_RAIL_STATIONS and CTA_RAIL_LINES) into the “data” directory of your project folder. Once the shapefiles are extracted run the following code to import the CTA rail stations into RStudio. Later in the exercise, you will also map select rail lines in QGIS.
# import CTA rail stations shapefile (downloaded from RTAMS)
rail_stations <- st_read("data/CTA_RAIL_STATIONS.shp") %>%
select(station_id = STATION_ID,
station_name = SHORTNAME,
station_name_long = LONGNAME,
station_line = LEGEND)Attribute (via spatial join) L stations with community area, zip codes
It will be necessary to attribute L stations with community areas and zip codes in order to evaluate CCVI, COVID-19 cases with ridership trends. That is, ridership is reported by L station whereas CCVI and COVID-19 cases are reported by community area and zip code, respectively. Therefore it is necessary to associate each L station with its (nearest) community area and zip code. Copy the following code to your script in RStudio to perform spatial joins and nearest neighbor analyses with these three geographic datasets.
# remove stations outside of the city of Chicago
rail_stations_chicago <- rail_stations %>%
st_join(community_areas) %>%
drop_na(comarea_id) %>%
select(station_id:station_line)
# stations with nearest community area
stations_ca <- st_join(rail_stations_chicago,
community_areas,
join = nngeo::st_nn,
k = 1,
progress = FALSE)
# stations with nearest zip code
stations_zip <- st_join(rail_stations_chicago,
zip_codes,
join = nngeo::st_nn,
k = 1,
progress = FALSE)
# combine above into one stations file with both community area and zip code attributes
rail_stations <- rail_stations_chicago %>%
left_join(stations_ca %>%
st_drop_geometry() %>%
select(station_id,comarea_id,comarea_name),
by="station_id") %>%
left_join(stations_zip %>%
st_drop_geometry() %>%
select(station_id,zip),
by="station_id")
# remove individual files
rm(stations_ca,stations_zip, rail_stations_chicago) Figure 1 below tabulates L stations by community area and zip code.
Create CCVI by Chicago community area dataset
We will be evaluating L ridership based on a variety of community characteristics included in the Chicago COVID-19 Community Vulnerability Index (CCVI). The index was adapted and modified from Surgo Ventures and the CDC Social Vulnerability Index to identify communities (i.e., both community areas and zip codes) in Chicago that have been disproportionately impacted by COVID-19. Vulnerability is defined as a combination of: sociodemographic factors; epidemiological factors; occupational factors and other factors related to cumulative COVID burden. These factors are combined and weighted to create a single composite CCVI score. The higher the score, the more vulnerable the geographic area or, in this case, community area. Copy and paste the following code into your exercise R script to import the datasets.
# Chicago COVID-19 Community Vulnerability Index (CCVI) from Chicago data portal by community area
ccvi_attributes <- read_csv("https://data.cityofchicago.org/api/views/rqqg-u7zt/rows.csv?accessType=DOWNLOAD") %>%
select(-c(`Geography Type`,`Location`)) %>%
rename(comarea_id = 1)
# join community areas with CCVI attributes to create custom dataset
# add one to rank values to adjust range from 0-76 to 1-77
# sort/arrange table by CCVI Score
community_areas_ccvi <- community_areas %>%
left_join(ccvi_attributes, by = "comarea_id") %>%
mutate_at(
vars(
`Rank - Socioeconomic Status`:`Rank - COVID-19 Crude Mortality Rate`
),
list(
ntile = function(x, na.rm = TRUE)
(ntile(x, 5)
))
) %>%
mutate_at(
vars(
`Rank - Socioeconomic Status`:`Rank - COVID-19 Crude Mortality Rate`
),
function(x,na.rm=TRUE)(x+1)
) %>%
arrange(`CCVI Score`)Create map showing CCVI distribution by community area
R has several packages devoted to map making. The leaflet package can be used to create custom, interactive maps (Figure 2). Modify the color palette (see this reference for palette names) in the interactive map below to create a custom figure showing variations by CCVI category. Insert the map into slide 1 of your presentation.
# create interactive map using leaflet
# color palette
map_pal <- colorFactor(palette = "YlOrRd", # EDIT HERE
domain = community_areas_ccvi$`CCVI Category`,
ordered = TRUE)
# hover labels
communityarea_labels <- sprintf(
"<strong>%s</strong><br/>
<i>CCVI Score:</i> %0.1f<br/>
<i>CCVI Category:</i> %s<br/><br/>
<em>Ranked scores from least (1) to most (77) vulnerable:</em><br/>
<ul style=\"list-style-type:square;\">
<li>Socioeconomic Status: %1.0f</li>
<li>Essential Workers: %1.0f</li>
<li>Mobility Ratio: %1.0f</li>
<li>COVID-19 Incidence Rate: %1.0f</li>
<li>COVID-19 Hospitalization Rate: %1.0f</li>
<li>COVID-19 Crude Mortality Rate: %1.0f</li>",
community_areas_ccvi$comarea_name,
community_areas_ccvi$`CCVI Score`,
community_areas_ccvi$`CCVI Category`,
community_areas_ccvi$`Rank - Socioeconomic Status`,
community_areas_ccvi$`Rank - Frontline Essential Workers`,
community_areas_ccvi$`Rank - Cumulative Mobility Ratio`,
community_areas_ccvi$`Rank - COVID-19 Incidence Rate`,
community_areas_ccvi$`Rank - COVID-19 Hospital Admission Rate`,
community_areas_ccvi$`Rank - COVID-19 Crude Mortality Rate`) %>%
lapply(htmltools::HTML)
# interactive map
leaflet(data = community_areas_ccvi %>% st_transform(crs=4326)) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~map_pal(`CCVI Category`),
label = communityarea_labels,
weight = 2,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7) %>%
addLegend("bottomleft",
pal = map_pal,
values = ~`CCVI Category`,
title = "CCVI Category",
opacity = 1)Write community areas shapefile
Use the following code to write the attributed community areas layer as a shapefile to your “data” directory. The code abbreviates the longer column names to ease mapping in QGIS.
# subset data frame and abbreviate column names
community_areas_ccvi_abbv <- community_areas_ccvi %>%
select(-c(comarea_id, `Community Area Name`)) %>%
select(
comname = `comarea_name`,
ccviscore = `CCVI Score`,
ccvicat = `CCVI Category`,
ses = `Rank - Socioeconomic Status`,
hhcomp = `Rank - Household Composition and Disability`,
nopcp = `Rank - Adults with no PCP`,
mobrat = `Rank - Cumulative Mobility Ratio`,
essen = `Rank - Frontline Essential Workers`,
age65 = `Rank - Age 65+`,
comcon = `Rank - Comorbid Conditions`,
covinc = `Rank - COVID-19 Incidence Rate`,
covhsp = `Rank - COVID-19 Hospital Admission Rate`,
covmor = `Rank - COVID-19 Crude Mortality Rate`,
ses_n = `Rank - Socioeconomic Status_ntile`,
hhcomp_n = `Rank - Household Composition and Disability_ntile`,
nopcp_n = `Rank - Adults with no PCP_ntile`,
mobrat_n = `Rank - Cumulative Mobility Ratio_ntile`,
essen_n = `Rank - Frontline Essential Workers_ntile`,
age65_n = `Rank - Age 65+_ntile`,
comcon_n = `Rank - Comorbid Conditions_ntile`,
covinc_n = `Rank - COVID-19 Incidence Rate_ntile`,
covhsp_n = `Rank - COVID-19 Hospital Admission Rate_ntile`,
covmor_n = `Rank - COVID-19 Crude Mortality Rate_ntile`
)
# clean data frame prior to conversion
community_areas_ccvi_cleaned <- community_areas_ccvi_abbv %>%
st_as_sf()
# write shapefile to layers folder
st_write(community_areas_ccvi_cleaned, "data/community_areas_ccvi.shp", append = FALSE)Deleting layer `community_areas_ccvi' using driver `ESRI Shapefile'
Writing layer `community_areas_ccvi' to data source
`data/community_areas_ccvi.shp' using driver `ESRI Shapefile'
Writing 77 features with 23 fields and geometry type Multi Polygon.
Step 5. Evaluate CTA L ridership data before and during the COVID-19 pandemic
Create ridership by L station dataset
The Regional Transportation Authority provides information on the location and performance of the three public transit operators within metropolitan Chicago, namely, PACE Suburban Bus, Chicago Transit Authority (CTA both bus and rail), and Metra commuter rail. The Chicago Data Portal also serves up daily boardings for CTA. For this exercise we focus specifically on CTA rail ridership. Copy and paste the following code to create two tables: (1) daily rail boardings for all L stations (from Chicago Data Portal); and (2) monthly boardings by L station (and line) (from RTAMS).
# import daily boardings from the Chicago data portal
ridership_daily_boardings <- read_csv("https://data.cityofchicago.org/api/views/6iiy-9s97/rows.csv?accessType=DOWNLOAD") %>%
mutate(service_date = as.Date(service_date, "%m/%d/%Y")) %>%
distinct() %>%
arrange(service_date)
# import monthly boardings by rail station
ridership_attributes <- read_csv("https://www.rtams.org/sites/default/files/documents/2024-02/CTA_Average_Rail_Station_Ridership_1999_2023.csv") %>%
select (station_id = `RIDERSHIP_ID`,
station_name = NAME,
year = YEAR,
month = MONTH,
day_type = DAY_TYPE,
rides_total = TOTAL_RIDES,
rides_avg = DAILY_AVG_RIDES) %>%
mutate(year_month = as.Date(paste0(as.character(year),"-",month,"-01"), "%Y-%m-%d"))
# create boardings per month (all station) summary table
ridership_attributes_month <- ridership_attributes %>%
filter(day_type == "Weekday") %>%
group_by(year_month) %>%
summarise(rides_total = sum(rides_total))Create plots showing ridership trends over time
The following code uses the “plotly” package to create charts showing trends in total (i.e., station boardings) by total (Figure 3) and average weekday (Figure 4) boardings by month between January 1999 through October 2023.
# create plot showing rail boardings per month
plot_ly(data = ridership_attributes_month
%>%
st_drop_geometry(),
x = ~year_month,
y = ~`rides_total`,
type = 'scatter',
mode = 'lines',
line = list(color = "#080967",
width = 0.5),
name = 'rides_total',
hovertemplate = 'Rides: %{y:,}<extra></extra>') %>%
layout(
xaxis = list(title = ""),
yaxis = list(title = "Total Rail Station Boardings"),
legend=list(
font = list(
size = 10
),
orientation = "h",
xanchor="center",
x = 0.5, y=-0.1),
hovermode = "x unified")# create plot showing weekday rail boardings
plot_ly(data = ridership_daily_boardings %>%
filter(day_type=="W"),
x = ~service_date,
y = ~`rail_boardings`,
type = 'scatter',
mode = 'lines',
line = list(color = "#9F1928",
width = 0.5),
name = 'rail_boardings',
hovertemplate = 'Rail boardings: %{y:,}<extra></extra>') %>%
layout(
xaxis = list(title = ""),
yaxis = list(title = "Average Weekday Rail Station Boardings"),
legend=list(
font = list(
size = 10
),
orientation = "h",
xanchor="center",
x = 0.5, y=-0.1),
hovermode = "x unified")Join CCVI and rail ridership with L station dataset
Now we perform an attribute join between the L stations, CCVI and ridership data. This will allow us to link trends in ridership by community area with CCVI scores and ranks.
rail_stations_ccvi <- rail_stations %>%
left_join(community_areas_ccvi %>%
st_drop_geometry() %>%
select(-comarea_name),
by="comarea_id")
rail_stations_ccvi_ridership <- ridership_attributes %>%
filter(day_type=="Weekday") %>%
select(station_id,
year_month,
rides_total) %>%
left_join(rail_stations_ccvi %>%
st_drop_geometry(),
by="station_id")Figure 5 below tabulates L stations by line and CCVI.
Import weekly COVID-19 cases data from the city of Chicago
The Chicago data portal also makes available COVID-19 case data summarized by week. These data can be plotted to identify different waves of the pandemic then compared to trends in rail usage. Use the following code to download and summarize the latest COVID-19 case data.
covid19_cases_raw <- read_csv("https://data.cityofchicago.org/api/views/naz8-j4nc/rows.csv?accessType=DOWNLOAD&bom=true&format=true")
covid19_cases <- covid19_cases_raw %>%
rename(Cases=`Cases - Total`,Deaths=`Deaths - Total`) %>%
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
drop_na(Cases)
covid19_cases_summary <- covid19_cases %>%
mutate_if(is.numeric,list(~replace_na(.,0))) %>%
ungroup() %>%
arrange(Date) %>%
select(Date, Cases, Deaths) %>%
mutate(Cases7d = rollmean(x=Cases, k=7, fill=0, align="right"),
Deaths7d = rollmean(x=Deaths, k=7, fill=0, align="right"),
Casescm = cumsum(Cases), Deathscm = cumsum(Deaths), Geography="Chicago") %>%
pivot_longer(cols = c(Cases:Deathscm), names_to = "layer", values_to = "value") Compare trends in rail ridership (above) with COVID-19 case trends made available on the city of Chicago’s COVID-19 dashboard and/or the weekly case plot shown below (Figure 6). Create a customized version of the below plot by changing the filter option from weekly average cases (i.e., “Cases7d”) to weekly average deaths (i.e., “Deaths7d”).
# Create a line chart using the Cases7d variable
covidcases7davg_plot <- plot_ly(data = covid19_cases_summary %>%
filter(layer == "Cases7d"), # EDIT HERE
x = ~Date,
y = ~value,
type = 'scatter',
mode = 'lines',
line = list(color = "#9F1928",
width = 0.5),
name = 'COVID-19 Cases (7-day average)',
hovertemplate = 'COVID19 cases: %{y:,}<extra></extra>') %>%
layout(
xaxis = list(title = ""),
yaxis = list(title = "COVID-19 Cases (7-day average)"),
legend=list(
font = list(
size = 10
),
orientation = "h",
xanchor="center",
x = 0.5, y=-0.1),
hovermode = "x unified")
covidcases7davg_plotStep 6. Examine ridership by station characteristics
Figure 7 plots ridership trends by CCVI category (i.e., Low, Medium, and High) whereas Figure 8 shows ridership by CTA L Line.
# create plot showing rail boardings per month by station characteristics
rail_stations_ccvi_ridership_custom <- rail_stations_ccvi_ridership %>%
st_drop_geometry() %>%
group_by(year_month,
`CCVI Category`) %>%
summarise(rides_total = sum(rides_total)) %>%
drop_na(`CCVI Category`)
pal_ccvi <- c("#E2F11A", "#E0A100", "#9F1928")
pal_ccvi <- setNames(pal_ccvi, c("LOW", "MEDIUM", "HIGH"))
plot_ly(data = rail_stations_ccvi_ridership_custom,
x = ~year_month,
y = ~`rides_total`,
# split = ~`CCVI Category`,
color = ~`CCVI Category`,
colors = pal_ccvi,
mode = 'scatter',
hovertemplate = 'Rides: %{y:,}<extra></extra>') %>%
layout(
xaxis = list(title = ""),
yaxis = list(title = "Total Rail Station Boardings"),
legend=list(
font = list(
size = 10),
orientation = "h",
xanchor="center",
x = 0.5, y=-0.1),
hovermode = "x unified")Review Figure 8 below and identify two CTA rail lines that you’d like to explore further. Create a custom chart comparing the two L lines you selected by filtering data in the “station_line” column. Do this by first uncommenting the line beginning with “filter” function then modifying the lines being filtered (e.g., replace “BL” with “RD”). Copy the custom chart in the presentation.
# create plot showing rail boardings per month by station characteristics
rail_stations_ccvi_ridership_custom <- rail_stations_ccvi_ridership %>%
# filter(station_line == "BL" | station_line == "BR") %>% # EDIT HERE
st_drop_geometry() %>%
group_by(year_month,
station_line) %>%
summarise(rides_total = sum(rides_total)) %>%
drop_na(station_line)
# create color palette for rail lines
pal_line <- c("#3498db", "#964B00", "#1e8449","#424949","#e67e22","#f5b7b1","#e74c3c")
pal_line <- setNames(pal_line, c("BL", "BR", "GR", "ML", "OR", "PK", "RD"))
plot_ly(data = rail_stations_ccvi_ridership_custom,
x = ~year_month,
y = ~`rides_total`,
color = ~`station_line`,
colors = pal_line,
mode = 'scatter',
text = ~`station_line`,
hovertemplate = '%{text}: %{y:,}<extra></extra>') %>%
layout(
xaxis = list(title = ""),
yaxis = list(title = "Total Rail Station Boardings"),
legend=list(
font = list(
size = 10),
orientation = "h",
xanchor="center",
x = 0.5, y=-0.1),
hovermode = "x unified")Step 7. Make custom map in QGIS
Use the three geographic layers in your “data” directory to create a custom (but rather basic) map in QGIS showing the two lines (and associated stations) you selected in step 6. Copy the map into slide 5 of your presentation.